On Cluster Validity and the Information Need of Users

Authors

  • Benno Stein
  • Sven Meyer
  • Frank Wißbrock
Abstract

In the field of information retrieval, clustering algorithms are used to analyze large collections of documents with the objective of forming groups of similar documents. Clustering a document collection is an ambiguous task: a clustering, i.e., a set of document groups, depends on the chosen clustering algorithm as well as on the algorithm’s parameter settings. To find the best among several clusterings, it is common practice to evaluate their internal structures with a cluster validity measure. A clustering is considered useful to a user if particular structural properties are well developed. Nevertheless, the presence of certain structural properties may not guarantee usefulness from an information retrieval standpoint, that is, whether or not the document groups found resemble the classification of a human editor. This paper investigates this point: based on already classified document collections, we generate clusterings and compare their predicted quality to their real quality. Our analysis includes the classical cluster validity measures from Dunn and Davies-Bouldin as well as the new graph-based measures Λ (weighted edge connectivity) and ρ (expected edge density). The experiments show that the classical measures behave consistently insofar as mediocre and poor clusterings are identified as such. On real-world document clustering data, however, they are clearly outperformed by the expected edge density ρ. This superiority of the graph-based measures can be explained by their independence of cluster forms and distances.
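
To make the two classical measures named in the abstract concrete, the following is a minimal Python sketch of the Dunn index and the Davies-Bouldin index on a toy 2D data set. It is not the authors' implementation: both indices exist in several variants, and this sketch assumes Euclidean distance, the smallest point-to-point distance between clusters, the largest point-to-point distance within a cluster as diameter, and the mean distance to the centroid as scatter. The function names and the `good` and `bad` toy clusterings are hypothetical and chosen only for illustration. The graph-based measures Λ and ρ are not sketched because the abstract does not define them.

```python
import math
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


def dunn_index(clusters):
    """Dunn index (one common variant): higher values indicate compact,
    well-separated clusters. Assumes at least two clusters and at least
    one cluster with two or more points."""
    # Smallest distance between two points that lie in different clusters.
    min_between = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    # Largest distance between two points that lie in the same cluster.
    max_diameter = max(
        math.dist(p, q)
        for c in clusters
        for p, q in combinations(c, 2)
    )
    return min_between / max_diameter


def davies_bouldin_index(clusters):
    """Davies-Bouldin index: lower values indicate better clusterings.
    Assumes at least two clusters with distinct centroids."""
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance of a cluster's points to its centroid.
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c)
        for c, m in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Hypothetical toy data: two well-separated squares of four points each.
good = [
    [(0, 0), (0, 1), (1, 0), (1, 1)],
    [(5, 5), (5, 6), (6, 5), (6, 6)],
]
# The same eight points, grouped so that each cluster mixes both squares.
bad = [
    [(0, 0), (0, 1), (5, 5), (5, 6)],
    [(1, 0), (1, 1), (6, 5), (6, 6)],
]

if __name__ == "__main__":
    for name, clustering in (("good", good), ("bad", bad)):
        print(name, dunn_index(clustering), davies_bouldin_index(clustering))
```

Under these assumptions the `good` grouping receives a higher Dunn index and a lower Davies-Bouldin index than the `bad` one. Both indices are built from distances and cluster extents, which is the dependence on cluster forms and distances that the abstract contrasts with the graph-based measures.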

Similar resources

Identification of parameters affecting the success of the hospital information system & presentation of a model for user satisfaction improvement

Complex institutions comprising several divisions and departments, such as hospitals, need access to information. A hospital information system has many capabilities, and if it is accepted by hospital staff, it leads to a revolution in the health care delivery industry. The identification of effective determinants and measures on the success of hospital information systems could sign...

Prediction of user's trustworthiness in web-based social networks via text mining

In social networks, users need a proper estimation of trust in others to be able to initiate reliable relationships. Some trust evaluation mechanisms have been proposed that use direct ratings to calculate or propagate trust values. However, in some web-based social networks where users only have binary relationships, there is no direct rating available. Therefore, a new method is required t...

Design, Implementation and Evaluation of Software to Increase Users’ Awareness and Facilitate the Identification of the Most Appropriate Centers Providing Laboratory Services in Tehran Province

Background and Aim: Medical diagnostic laboratories are among the most important centers in the treatment cycle of patients. Today, the conscious choice of such laboratories is one of the challenges that patients face in the treatment process. This study was conducted with the aim of improving the knowledge of software users in the field of laboratory sciences and also facilitating the consciou...

An Investigation on the User Behavior in Social Commerce Platforms: A Text Analytics Approach

Nowadays, the tourism industry accounts for approximately 10% of the global GDP, while it only contributes 3% of the economy in Iran. Since the pressure of US sanctions increases day after day on the Iranian economy, the necessity of paying attention to this industry as a source of foreign currency is felt more than ever. The purpose of this research is to analyze the reviews of users of social...

The Impact of Users’ Perception of Social Responsibility on the Usage of Public Library Services and Resources with the Mediated Role of Perceived Organizational Image

Purpose: Every organization’s commitment to its social responsibilities is a way to form a positive mental image among its audience that can affect their attitude towards using the organization’s services. In the present era, despite the various media with diverse capabilities for providing and disseminating information, the field of information is experiencing intense competi...

The Effect of Perceived Privacy, Security, and Trust on Information-Sharing Behavior in Mobile Social Networks: The Moderating Role of Gender

The emergence of social networks has been one of the most important events of recent decades. One of the issues raised in these networks is how to establish trust. The purpose of this paper is to examine the impact of security, trust, and privacy on information sharing in mobile social networks. The study also describes how users' gender moderates the impact of privacy and security on trust. The current...

Publication date: 2004